Overview

Dataset statistics

Number of variables: 21
Number of observations: 150460
Missing cells: 431135
Missing cells (%): 13.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 24.1 MiB
Average record size in memory: 168.0 B
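
The percentage and per-record figures above follow from the raw counts. The sketch below recomputes them, assuming that the missing-cell percentage is taken over all cells (variables times observations) and that 1 MiB is 1024 * 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 21
n_observations = 150460
missing_cells = 431135
total_size_mib = 24.1

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 ** 2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values round to the figures reported above.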

Variable types

Numeric: 8
Categorical: 13

Warnings

batsman has a high cardinality: 461 distinct values (High cardinality)
non_striker has a high cardinality: 458 distinct values (High cardinality)
bowler has a high cardinality: 356 distinct values (High cardinality)
player_dismissed has a high cardinality: 436 distinct values (High cardinality)
fielder has a high cardinality: 451 distinct values (High cardinality)
wide_runs is highly correlated with extra_runs (High correlation)
legbye_runs is highly correlated with extra_runs (High correlation)
batsman_runs is highly correlated with total_runs (High correlation)
extra_runs is highly correlated with wide_runs and 1 other field (High correlation)
total_runs is highly correlated with batsman_runs (High correlation)
inning is highly correlated with is_super_over (High correlation)
total_runs is highly correlated with batsman_runs and 3 other fields (High correlation)
penalty_runs is highly correlated with extra_runs (High correlation)
bowling_team is highly correlated with match_id (High correlation)
match_id is highly correlated with bowling_team and 1 other field (High correlation)
batsman_runs is highly correlated with total_runs and 1 other field (High correlation)
is_super_over is highly correlated with inning (High correlation)
extra_runs is highly correlated with total_runs and 3 other fields (High correlation)
batting_team is highly correlated with match_id (High correlation)
wide_runs is highly correlated with total_runs and 1 other field (High correlation)
dismissal_kind is highly correlated with total_runs and 1 other field (High correlation)
legbye_runs is highly correlated with extra_runs (High correlation)
inning is highly correlated with is_super_over (High correlation)
bye_runs is highly correlated with dismissal_kind (High correlation)
is_super_over is highly correlated with inning (High correlation)
penalty_runs is highly correlated with dismissal_kind (High correlation)
dismissal_kind is highly correlated with bye_runs and 1 other field (High correlation)
player_dismissed has 143022 (95.1%) missing values (Missing)
dismissal_kind has 143022 (95.1%) missing values (Missing)
fielder has 145091 (96.4%) missing values (Missing)
wide_runs has 145885 (97.0%) zeros (Zeros)
legbye_runs has 147870 (98.3%) zeros (Zeros)
batsman_runs has 61148 (40.6%) zeros (Zeros)
extra_runs has 142264 (94.6%) zeros (Zeros)
total_runs has 53305 (35.4%) zeros (Zeros)
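
The percentages in the Missing and Zeros warnings can be recomputed from the counts and the number of observations. The sketch below does this for every flagged variable; the helper name `share` is hypothetical, and the thresholds that make the profiler raise a warning are not reproduced here.

```python
N_OBSERVATIONS = 150460  # "Number of observations" from the overview

# (variable, flagged count, kind) as listed in the warnings of this report.
flagged = [
    ("player_dismissed", 143022, "missing values"),
    ("dismissal_kind", 143022, "missing values"),
    ("fielder", 145091, "missing values"),
    ("wide_runs", 145885, "zeros"),
    ("legbye_runs", 147870, "zeros"),
    ("batsman_runs", 61148, "zeros"),
    ("extra_runs", 142264, "zeros"),
    ("total_runs", 53305, "zeros"),
]


def share(count, total=N_OBSERVATIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count, kind in flagged:
    print(f"{name} has {count} ({share(count)}%) {kind}")
```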

Reproduction

Analysis started: 2021-07-08 18:35:37.912975
Analysis finished: 2021-07-08 18:36:28.468099
Duration: 50.56 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
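
The reported duration is the difference between the two timestamps. A minimal sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Reproduction" block of this report.
started = datetime.strptime("2021-07-08 18:35:37.912975", FMT)
finished = datetime.strptime("2021-07-08 18:36:28.468099", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```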

Variables

match_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 636
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 318.2813173
Minimum: 1
Maximum: 636
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 32
Q1: 161
median: 319
Q3: 476
95-th percentile: 604
Maximum: 636
Range: 635
Interquartile range (IQR): 315
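
Range and IQR are simple differences of the quantiles listed above. The sketch below recomputes them and, as an addition that is not part of the report, applies the common 1.5 * IQR fence rule to show that no match_id value would count as an outlier under that rule.

```python
# Quantiles copied from the match_id "Quantile statistics" block.
minimum, q1, median, q3, maximum = 1, 161, 319, 476, 636

value_range = maximum - minimum
iqr = q3 - q1

# Assumption: Tukey's 1.5 * IQR fences, which the report itself does not use.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
has_outliers = minimum < lower_fence or maximum > upper_fence

print(f"Range: {value_range}, IQR: {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}], outliers present: {has_outliers}")
```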

Descriptive statistics

Standard deviation: 182.955531
Coefficient of variation (CV): 0.5748233437
Kurtosis: -1.191121596
Mean: 318.2813173
Median Absolute Deviation (MAD): 158
Skewness: -0.0002120242534
Sum: 47888607
Variance: 33472.72634
Monotonicity: Increasing
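
Several of these descriptive statistics are tied together by definition: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A short check, assuming every one of the 150460 rows contributes to the sum (match_id has no missing values):

```python
# Figures copied from the match_id statistics and the dataset overview.
n_observations = 150460
total = 47888607        # "Sum"
std_dev = 182.955531    # "Standard deviation"

mean = total / n_observations
variance = std_dev ** 2
cv = std_dev / mean

print(f"Mean: {mean:.7f}")
print(f"Variance: {variance:.5f}")
print(f"Coefficient of variation: {cv:.10f}")
```

The results agree with the reported values up to the rounding of the inputs.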
[Figure: histogram with fixed size bins (bins=50)]
Most frequent values
Value               Count   Frequency (%)
126                 267     0.2%
34                  263     0.2%
534                 262     0.2%
476                 262     0.2%
388                 261     0.2%
190                 259     0.2%
570                 259     0.2%
401                 258     0.2%
536                 258     0.2%
257                 257     0.2%
Other values (626)  147854  98.3%

Smallest values
Value  Count  Frequency (%)
1      248    0.2%
2      247    0.2%
3      218    0.1%
4      247    0.2%
5      248    0.2%
6      216    0.1%
7      254    0.2%
8      212    0.1%
9      226    0.2%
10     239    0.2%

Largest values
Value  Count  Frequency (%)
636    250    0.2%
635    242    0.2%
634    244    0.2%
633    240    0.2%
632    236    0.2%
631    247    0.2%
630    236    0.2%
629    250    0.2%
628    247    0.2%
627    214    0.1%

inning
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
1      78029
2      72350
3      43
4      38

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 150460
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      78029  51.9%
2      72350  48.1%
3      43     < 0.1%
4      38     < 0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value  Count  Frequency (%)
1      78029  51.9%
2      72350  48.1%
3      43     < 0.1%
4      38     < 0.1%

Most occurring characters

Value  Count  Frequency (%)
1      78029  51.9%
2      72350  48.1%
3      43     < 0.1%
4      38     < 0.1%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  150460  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      78029  51.9%
2      72350  48.1%
3      43     < 0.1%
4      38     < 0.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  150460  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      78029  51.9%
2      72350  48.1%
3      43     < 0.1%
4      38     < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  150460  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      78029  51.9%
2      72350  48.1%
3      43     < 0.1%
4      38     < 0.1%

batting_team
Categorical

HIGH CORRELATION

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value             Count
MI                18943
RCB               17678
KXIP              17594
KKR               17229
DD                17185
Other values (9)  61831

Length

Max length: 4
Median length: 2
Mean length: 2.604220391
Min length: 2

Characters and Unicode

Total characters: 391831
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SH
2nd row: SH
3rd row: SH
4th row: SH
5th row: SH

Common Values

Value             Count  Frequency (%)
MI                18943  12.6%
RCB               17678  11.7%
KXIP              17594  11.7%
KKR               17229  11.5%
DD                17185  11.4%
CSK               15754  10.5%
RR                13914  9.2%
SH                9058   6.0%
DC                9034   6.0%
PW                5443   3.6%
Other values (4)  8628   5.7%

Length

[Figure: histogram of lengths of the category]

Words

Value             Count  Frequency (%)
mi                18943  12.6%
rcb               17678  11.7%
kxip              17594  11.7%
kkr               17229  11.5%
dd                17185  11.4%
csk               15754  10.5%
rr                13914  9.2%
sh                9058   6.0%
dc                9034   6.0%
pw                5443   3.6%
Other values (4)  8628   5.7%

Most occurring characters

Value             Count  Frequency (%)
K                 70970  18.1%
R                 66215  16.9%
D                 43404  11.1%
C                 42466  10.8%
I                 36537  9.3%
P                 26517  6.8%
S                 26392  6.7%
M                 18943  4.8%
B                 17678  4.5%
X                 17594  4.5%
Other values (5)  25115  6.4%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  391831  100.0%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
K                 70970  18.1%
R                 66215  16.9%
D                 43404  11.1%
C                 42466  10.8%
I                 36537  9.3%
P                 26517  6.8%
S                 26392  6.7%
M                 18943  4.8%
B                 17678  4.5%
X                 17594  4.5%
Other values (5)  25115  6.4%

Most occurring scripts

Value  Count   Frequency (%)
Latin  391831  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
K                 70970  18.1%
R                 66215  16.9%
D                 43404  11.1%
C                 42466  10.8%
I                 36537  9.3%
P                 26517  6.8%
S                 26392  6.7%
M                 18943  4.8%
B                 17678  4.5%
X                 17594  4.5%
Other values (5)  25115  6.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  391831  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
K                 70970  18.1%
R                 66215  16.9%
D                 43404  11.1%
C                 42466  10.8%
I                 36537  9.3%
P                 26517  6.8%
S                 26392  6.7%
M                 18943  4.8%
B                 17678  4.5%
X                 17594  4.5%
Other values (5)  25115  6.4%

bowling_team
Categorical

HIGH CORRELATION

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value             Count
MI                18879
RCB               17920
KKR               17411
KXIP              17392
DD                17099
Other values (9)  61759

Length

Max length: 4
Median length: 2
Mean length: 2.603708627
Min length: 2

Characters and Unicode

Total characters: 391754
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RCB
2nd row: RCB
3rd row: RCB
4th row: RCB
5th row: RCB

Common Values

Value             Count  Frequency (%)
MI                18879  12.5%
RCB               17920  11.9%
KKR               17411  11.6%
KXIP              17392  11.6%
DD                17099  11.4%
CSK               15562  10.3%
RR                14111  9.4%
DC                9039   6.0%
SH                8888   5.9%
PW                5457   3.6%
Other values (4)  8702   5.8%

Length

[Figure: histogram of lengths of the category]

Words

Value             Count  Frequency (%)
mi                18879  12.5%
rcb               17920  11.9%
kkr               17411  11.6%
kxip              17392  11.6%
dd                17099  11.4%
csk               15562  10.3%
rr                14111  9.4%
dc                9039   6.0%
sh                8888   5.9%
pw                5457   3.6%
Other values (4)  8702   5.8%

Most occurring characters

Value             Count  Frequency (%)
K                 71004  18.1%
R                 67096  17.1%
D                 43237  11.0%
C                 42521  10.9%
I                 36271  9.3%
P                 26392  6.7%
S                 26065  6.7%
M                 18879  4.8%
B                 17920  4.6%
X                 17392  4.4%
Other values (5)  24977  6.4%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  391754  100.0%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
K                 71004  18.1%
R                 67096  17.1%
D                 43237  11.0%
C                 42521  10.9%
I                 36271  9.3%
P                 26392  6.7%
S                 26065  6.7%
M                 18879  4.8%
B                 17920  4.6%
X                 17392  4.4%
Other values (5)  24977  6.4%

Most occurring scripts

Value  Count   Frequency (%)
Latin  391754  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
K                 71004  18.1%
R                 67096  17.1%
D                 43237  11.0%
C                 42521  10.9%
I                 36271  9.3%
P                 26392  6.7%
S                 26065  6.7%
M                 18879  4.8%
B                 17920  4.6%
X                 17392  4.4%
Other values (5)  24977  6.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  391754  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
K                 71004  18.1%
R                 67096  17.1%
D                 43237  11.0%
C                 42521  10.9%
I                 36271  9.3%
P                 26392  6.7%
S                 26065  6.7%
M                 18879  4.8%
B                 17920  4.6%
X                 17392  4.4%
Other values (5)  24977  6.4%

over
Real number (ℝ≥0)

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.14264921
Minimum: 1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
median: 10
Q3: 15
95-th percentile: 19
Maximum: 20
Range: 19
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 5.674337781
Coefficient of variation (CV): 0.5594532221
Kurtosis: -1.181196948
Mean: 10.14264921
Median Absolute Deviation (MAD): 5
Skewness: 0.05352487634
Sum: 1526063
Variance: 32.19810925
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=20)]

Most frequent values
Value              Count  Frequency (%)
1                  8092   5.4%
2                  8018   5.3%
3                  7932   5.3%
4                  7902   5.3%
5                  7874   5.2%
6                  7865   5.2%
7                  7827   5.2%
8                  7799   5.2%
9                  7775   5.2%
10                 7726   5.1%
Other values (10)  71650  47.6%

Smallest values
Value  Count  Frequency (%)
1      8092   5.4%
2      8018   5.3%
3      7932   5.3%
4      7902   5.3%
5      7874   5.2%
6      7865   5.2%
7      7827   5.2%
8      7799   5.2%
9      7775   5.2%
10     7726   5.1%

Largest values
Value  Count  Frequency (%)
20     5649   3.8%
19     6543   4.3%
18     6979   4.6%
17     7234   4.8%
16     7332   4.9%
15     7457   5.0%
14     7536   5.0%
13     7615   5.1%
12     7637   5.1%
11     7668   5.1%

ball
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.616482786
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 5
95-th percentile: 6
Maximum: 9
Range: 8
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.807697644
Coefficient of variation (CV): 0.499849647
Kurtosis: -1.081816833
Mean: 3.616482786
Median Absolute Deviation (MAD): 2
Skewness: 0.0968573607
Sum: 544136
Variance: 3.267770773
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=9)]

Most frequent values
Value  Count  Frequency (%)
1      24397  16.2%
2      24330  16.2%
3      24261  16.1%
4      24202  16.1%
5      24123  16.0%
6      24041  16.0%
7      4324   2.9%
8      679    0.5%
9      103    0.1%

Smallest values
Value  Count  Frequency (%)
1      24397  16.2%
2      24330  16.2%
3      24261  16.1%
4      24202  16.1%
5      24123  16.0%
6      24041  16.0%
7      4324   2.9%
8      679    0.5%
9      103    0.1%

Largest values
Value  Count  Frequency (%)
9      103    0.1%
8      679    0.5%
7      4324   2.9%
6      24041  16.0%
5      24123  16.0%
4      24202  16.1%
3      24261  16.1%
2      24330  16.2%
1      24397  16.2%

batsman
Categorical

HIGH CARDINALITY

Distinct: 461
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value               Count
V Kohli             3494
G Gambhir           3433
SK Raina            3369
RG Sharma           3274
S Dhawan            3005
Other values (456)  133885

Length

Max length: 20
Median length: 9
Mean length: 9.389651735
Min length: 5

Characters and Unicode

Total characters: 1412767
Distinct characters: 54
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 10
Unique (%): < 0.1%

Sample

1st row: DA Warner
2nd row: DA Warner
3rd row: DA Warner
4th row: DA Warner
5th row: DA Warner

Common Values

Value               Count   Frequency (%)
V Kohli             3494    2.3%
G Gambhir           3433    2.3%
SK Raina            3369    2.2%
RG Sharma           3274    2.2%
S Dhawan            3005    2.0%
RV Uthappa          2960    2.0%
DA Warner           2902    1.9%
MS Dhoni            2680    1.8%
AM Rahane           2602    1.7%
CH Gayle            2532    1.7%
Other values (451)  120209  79.9%

Length

[Figure: histogram of lengths of the category]

Words

Value               Count   Frequency (%)
v                   5405    1.8%
s                   5250    1.7%
singh               4259    1.4%
sharma              4061    1.3%
da                  4037    1.3%
sr                  3993    1.3%
m                   3988    1.3%
smith               3827    1.2%
sk                  3649    1.2%
ms                  3526    1.1%
Other values (639)  265846  86.4%

Most occurring characters

Value              Count   Frequency (%)
(space)            157381  11.1%
a                  156425  11.1%
i                  69544   4.9%
h                  65270   4.6%
n                  63108   4.5%
r                  62853   4.4%
e                  57463   4.1%
S                  55490   3.9%
l                  52188   3.7%
M                  38721   2.7%
Other values (44)  634324  44.9%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  822653  58.2%
Uppercase Letter  432516  30.6%
Space Separator   157381  11.1%
Dash Punctuation  217     < 0.1%
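
The category names in this table correspond to Unicode general categories. The sketch below shows one way such counts could be obtained with the standard library's `unicodedata.category`; it is not the profiler's own implementation, the mapping from two-letter codes to long names is written out by hand, and the three sample names are taken from the common values above.

```python
import unicodedata
from collections import Counter

# Hand-written mapping from general-category codes to the long names
# used in this report (only the categories that occur here are listed).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count the characters of all strings in values by Unicode category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample_counts = category_counts(["DA Warner", "V Kohli", "MS Dhoni"])
for name, count in sample_counts.most_common():
    print(name, count)
```

Applied to the full batsman column, a count of this kind would be expected to produce the four categories listed above.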

Most frequent character per category

Uppercase Letter
Value              Count   Frequency (%)
S                  55490   12.8%
M                  38721   9.0%
R                  35347   8.2%
A                  33973   7.9%
K                  33837   7.8%
D                  30840   7.1%
P                  28741   6.6%
J                  21843   5.1%
G                  21412   5.0%
V                  20120   4.7%
Other values (16)  112192  25.9%

Lowercase Letter
Value              Count   Frequency (%)
a                  156425  19.0%
i                  69544   8.5%
h                  65270   7.9%
n                  63108   7.7%
r                  62853   7.6%
e                  57463   7.0%
l                  52188   6.3%
s                  36770   4.5%
o                  30268   3.7%
t                  30211   3.7%
Other values (16)  198553  24.1%

Space Separator
Value    Count   Frequency (%)
(space)  157381  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      217    100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1255169  88.8%
Common  157598   11.2%

Most frequent character per script

Latin
Value              Count   Frequency (%)
a                  156425  12.5%
i                  69544   5.5%
h                  65270   5.2%
n                  63108   5.0%
r                  62853   5.0%
e                  57463   4.6%
S                  55490   4.4%
l                  52188   4.2%
M                  38721   3.1%
s                  36770   2.9%
Other values (42)  597337  47.6%

Common
Value    Count   Frequency (%)
(space)  157381  99.9%
-        217     0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1412767  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
(space)            157381  11.1%
a                  156425  11.1%
i                  69544   4.9%
h                  65270   4.6%
n                  63108   4.5%
r                  62853   4.4%
e                  57463   4.1%
S                  55490   3.9%
l                  52188   3.7%
M                  38721   2.7%
Other values (44)  634324  44.9%

non_striker
Categorical

HIGH CARDINALITY

Distinct: 458
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value               Count
G Gambhir           3635
SK Raina            3483
V Kohli             3351
RG Sharma           3306
S Dhawan            3248
Other values (453)  133437

Length

Max length: 20
Median length: 9
Mean length: 9.390522398
Min length: 5

Characters and Unicode

Total characters: 1412898
Distinct characters: 54
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: S Dhawan
2nd row: S Dhawan
3rd row: S Dhawan
4th row: S Dhawan
5th row: S Dhawan

Common Values

Value               Count   Frequency (%)
G Gambhir           3635    2.4%
SK Raina            3483    2.3%
V Kohli             3351    2.2%
RG Sharma           3306    2.2%
S Dhawan            3248    2.2%
RV Uthappa          2848    1.9%
AM Rahane           2831    1.9%
DA Warner           2672    1.8%
CH Gayle            2458    1.6%
MS Dhoni            2432    1.6%
Other values (448)  120196  79.9%

Length

[Figure: histogram of lengths of the category]

Words

Value               Count   Frequency (%)
s                   5401    1.8%
v                   5389    1.8%
sr                  4231    1.4%
sharma              4174    1.4%
m                   4118    1.3%
singh               3992    1.3%
da                  3788    1.2%
sk                  3733    1.2%
g                   3635    1.2%
gambhir             3635    1.2%
Other values (639)  265699  86.3%

Most occurring characters

Value              Count   Frequency (%)
a                  157577  11.2%
(space)            157335  11.1%
i                  69245   4.9%
h                  64890   4.6%
n                  63332   4.5%
r                  62581   4.4%
e                  58122   4.1%
S                  55327   3.9%
l                  51673   3.7%
M                  39326   2.8%
Other values (44)  633490  44.8%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  822709  58.2%
Uppercase Letter  432613  30.6%
Space Separator   157335  11.1%
Dash Punctuation  241     < 0.1%

Most frequent character per category

Uppercase Letter
Value              Count   Frequency (%)
S                  55327   12.8%
M                  39326   9.1%
R                  35728   8.3%
A                  33851   7.8%
K                  33733   7.8%
D                  30214   7.0%
P                  28558   6.6%
J                  21786   5.0%
G                  21613   5.0%
V                  20273   4.7%
Other values (16)  112204  25.9%

Lowercase Letter
Value              Count   Frequency (%)
a                  157577  19.2%
i                  69245   8.4%
h                  64890   7.9%
n                  63332   7.7%
r                  62581   7.6%
e                  58122   7.1%
l                  51673   6.3%
s                  36717   4.5%
u                  30170   3.7%
t                  29739   3.6%
Other values (16)  198663  24.1%

Space Separator
Value    Count   Frequency (%)
(space)  157335  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      241    100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1255322  88.8%
Common  157576   11.2%

Most frequent character per script

Latin
Value              Count   Frequency (%)
a                  157577  12.6%
i                  69245   5.5%
h                  64890   5.2%
n                  63332   5.0%
r                  62581   5.0%
e                  58122   4.6%
S                  55327   4.4%
l                  51673   4.1%
M                  39326   3.1%
s                  36717   2.9%
Other values (42)  596532  47.5%

Common
Value    Count   Frequency (%)
(space)  157335  99.8%
-        241     0.2%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1412898  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
a                  157577  11.2%
(space)            157335  11.1%
i                  69245   4.9%
h                  64890   4.6%
n                  63332   4.5%
r                  62581   4.4%
e                  58122   4.1%
S                  55327   3.9%
l                  51673   3.7%
M                  39326   2.8%
Other values (44)  633490  44.8%

bowler
Categorical

HIGH CARDINALITY

Distinct: 356
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value               Count
Harbhajan Singh     2989
A Mishra            2703
SL Malinga          2694
P Kumar             2637
PP Chawla           2594
Other values (351)  136843

Length

Max length: 17
Median length: 9
Mean length: 9.468669414
Min length: 5

Characters and Unicode

Total characters: 1424656
Distinct characters: 55
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: TS Mills
2nd row: TS Mills
3rd row: TS Mills
4th row: TS Mills
5th row: TS Mills

Common Values

Value               Count   Frequency (%)
Harbhajan Singh     2989    2.0%
A Mishra            2703    1.8%
SL Malinga          2694    1.8%
P Kumar             2637    1.8%
PP Chawla           2594    1.7%
R Ashwin            2359    1.6%
Z Khan              2276    1.5%
R Vinay Kumar       2161    1.4%
DW Steyn            2159    1.4%
IK Pathan           2113    1.4%
Other values (346)  125775  83.6%

Length

[Figure: histogram of lengths of the category]

Words

Value               Count   Frequency (%)
r                   8717    2.8%
singh               8667    2.8%
sharma              8066    2.6%
a                   7957    2.6%
kumar               6883    2.2%
m                   5046    1.6%
s                   4647    1.5%
p                   4561    1.5%
pp                  4539    1.5%
sk                  3704    1.2%
Other values (515)  243632  79.5%

Most occurring characters

Value              Count   Frequency (%)
a                  180198  12.6%
(space)            155959  10.9%
n                  77539   5.4%
r                  75955   5.3%
h                  69754   4.9%
i                  63294   4.4%
e                  61015   4.3%
S                  56017   3.9%
l                  47071   3.3%
M                  39833   2.8%
Other values (45)  598021  42.0%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   863759  60.6%
Uppercase Letter   404277  28.4%
Space Separator    155959  10.9%
Dash Punctuation   586     < 0.1%
Open Punctuation   25      < 0.1%
Decimal Number     25      < 0.1%
Close Punctuation  25      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count   Frequency (%)
a                  180198  20.9%
n                  77539   9.0%
r                  75955   8.8%
h                  69754   8.1%
i                  63294   7.3%
e                  61015   7.1%
l                  47071   5.4%
t                  34150   4.0%
o                  33277   3.9%
m                  31227   3.6%
Other values (16)  190279  22.0%

Uppercase Letter
Value              Count  Frequency (%)
S                  56017  13.9%
M                  39833  9.9%
P                  36269  9.0%
A                  35582  8.8%
K                  29790  7.4%
R                  28241  7.0%
J                  25776  6.4%
B                  21373  5.3%
D                  19161  4.7%
C                  15412  3.8%
Other values (14)  96823  23.9%

Space Separator
Value    Count   Frequency (%)
(space)  155959  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      586    100.0%

Open Punctuation
Value  Count  Frequency (%)
(      25     100.0%

Decimal Number
Value  Count  Frequency (%)
2      25     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      25     100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1268036  89.0%
Common  156620   11.0%

Most frequent character per script

Latin
Value              Count   Frequency (%)
a                  180198  14.2%
n                  77539   6.1%
r                  75955   6.0%
h                  69754   5.5%
i                  63294   5.0%
e                  61015   4.8%
S                  56017   4.4%
l                  47071   3.7%
M                  39833   3.1%
P                  36269   2.9%
Other values (40)  561091  44.2%

Common
Value    Count   Frequency (%)
(space)  155959  99.6%
-        586     0.4%
(        25      < 0.1%
2        25      < 0.1%
)        25      < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1424656  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
a                  180198  12.6%
(space)            155959  10.9%
n                  77539   5.4%
r                  75955   5.3%
h                  69754   4.9%
i                  63294   4.4%
e                  61015   4.3%
S                  56017   3.9%
l                  47071   3.3%
M                  39833   2.8%
Other values (45)  598021  42.0%

is_super_over
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
0      150379
1      81

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 150460
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      150379  99.9%
1      81      0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value  Count   Frequency (%)
0      150379  99.9%
1      81      0.1%

Most occurring characters

Value  Count   Frequency (%)
0      150379  99.9%
1      81      0.1%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  150460  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      150379  99.9%
1      81      0.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  150460  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      150379  99.9%
1      81      0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  150460  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      150379  99.9%
1      81      0.1%
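
The warning that inning and is_super_over are highly correlated is consistent with the value counts of the two variables. The sketch below compares the marginal counts only; matching totals suggest, but do not prove, that innings 3 and 4 are exactly the super-over deliveries, since the report contains no row-level data.

```python
# Value counts copied from the inning and is_super_over sections of this report.
inning_counts = {1: 78029, 2: 72350, 3: 43, 4: 38}
super_over_counts = {0: 150379, 1: 81}

regular_innings = inning_counts[1] + inning_counts[2]
extra_innings = inning_counts[3] + inning_counts[4]

print("Innings 1-2 vs is_super_over=0:", regular_innings, super_over_counts[0])
print("Innings 3-4 vs is_super_over=1:", extra_innings, super_over_counts[1])
```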

wide_runs
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.03749833843
Minimum: 0
Maximum: 5
Zeros: 145885
Zeros (%): 97.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2573975398
Coefficient of variation (CV): 6.864238538
Kurtosis: 189.3453059
Mean: 0.03749833843
Median Absolute Deviation (MAD): 0
Skewness: 11.68075305
Sum: 5642
Variance: 0.06625349347
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=6)]

Most frequent values
Value  Count   Frequency (%)
0      145885  97.0%
1      4130    2.7%
2      211     0.1%
5      192     0.1%
3      38      < 0.1%
4      4       < 0.1%

Smallest values
Value  Count   Frequency (%)
0      145885  97.0%
1      4130    2.7%
2      211     0.1%
3      38      < 0.1%
4      4       < 0.1%
5      192     0.1%

Largest values
Value  Count   Frequency (%)
5      192     0.1%
4      4       < 0.1%
3      38      < 0.1%
2      211     0.1%
1      4130    2.7%
0      145885  97.0%
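
Because wide_runs takes only six distinct values, its mean, variance, standard deviation, and coefficient of variation can be recomputed from the frequency table alone. The sketch below assumes the reported variance is the sample variance (denominator n - 1), which matches the pandas default; under that assumption the results agree with the descriptive statistics of this variable.

```python
import math

# Value -> count, copied from the wide_runs frequency table.
wide_runs_counts = {0: 145885, 1: 4130, 2: 211, 3: 38, 4: 4, 5: 192}

n = sum(wide_runs_counts.values())
total = sum(value * count for value, count in wide_runs_counts.items())
mean = total / n

# Assumption: sample variance with n - 1 in the denominator.
sum_sq_dev = sum(count * (value - mean) ** 2
                 for value, count in wide_runs_counts.items())
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)
cv = std_dev / mean

print(f"n={n}, sum={total}, mean={mean:.10f}")
print(f"variance={variance:.10f}, std={std_dev:.10f}, cv={cv:.8f}")
```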

bye_runs
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
0      150048
1      285
4      97
2      28
3      2

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 150460
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      150048  99.7%
1      285     0.2%
4      97      0.1%
2      28      < 0.1%
3      2       < 0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value  Count   Frequency (%)
0      150048  99.7%
1      285     0.2%
4      97      0.1%
2      28      < 0.1%
3      2       < 0.1%

Most occurring characters

Value  Count   Frequency (%)
0      150048  99.7%
1      285     0.2%
4      97      0.1%
2      28      < 0.1%
3      2       < 0.1%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  150460  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      150048  99.7%
1      285     0.2%
4      97      0.1%
2      28      < 0.1%
3      2       < 0.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  150460  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      150048  99.7%
1      285     0.2%
4      97      0.1%
2      28      < 0.1%
3      2       < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  150460  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      150048  99.7%
1      285     0.2%
4      97      0.1%
2      28      < 0.1%
3      2       < 0.1%

legbye_runs
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02223182241
Minimum: 0
Maximum: 5
Zeros: 147870
Zeros (%): 98.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2001042297
Coefficient of variation (CV): 9.000801914
Kurtosis: 230.1946943
Mean: 0.02223182241
Median Absolute Deviation (MAD): 0
Skewness: 13.4342673
Sum: 3345
Variance: 0.04004170275
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=6)]

Most frequent values
Value  Count   Frequency (%)
0      147870  98.3%
1      2251    1.5%
4      194     0.1%
2      125     0.1%
3      16      < 0.1%
5      4       < 0.1%

Smallest values
Value  Count   Frequency (%)
0      147870  98.3%
1      2251    1.5%
2      125     0.1%
3      16      < 0.1%
4      194     0.1%
5      4       < 0.1%

Largest values
Value  Count   Frequency (%)
5      4       < 0.1%
4      194     0.1%
3      16      < 0.1%
2      125     0.1%
1      2251    1.5%
0      147870  98.3%

noball_runs
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
0      149842
1      602
2      9
5      6
3      1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 150460
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      149842  99.6%
1      602     0.4%
2      9       < 0.1%
5      6       < 0.1%
3      1       < 0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value  Count   Frequency (%)
0      149842  99.6%
1      602     0.4%
2      9       < 0.1%
5      6       < 0.1%
3      1       < 0.1%

Most occurring characters

Value  Count   Frequency (%)
0      149842  99.6%
1      602     0.4%
2      9       < 0.1%
5      6       < 0.1%
3      1       < 0.1%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  150460  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      149842  99.6%
1      602     0.4%
2      9       < 0.1%
5      6       < 0.1%
3      1       < 0.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  150460  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      149842  99.6%
1      602     0.4%
2      9       < 0.1%
5      6       < 0.1%
3      1       < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  150460  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      149842  99.6%
1      602     0.4%
2      9       < 0.1%
5      6       < 0.1%
3      1       < 0.1%

penalty_runs
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value  Count
0      150458
5      2

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 150460
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      150458  > 99.9%
5      2       < 0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value  Count   Frequency (%)
0      150458  > 99.9%
5      2       < 0.1%

Most occurring characters

Value  Count   Frequency (%)
0      150458  > 99.9%
5      2       < 0.1%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  150460  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      150458  > 99.9%
5      2       < 0.1%

Most occurring scripts

Value   Count   Frequency (%)
Common  150460  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      150458  > 99.9%
5      2       < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  150460  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      150458  > 99.9%
5      2       < 0.1%

batsman_runs
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.222444504
Minimum: 0
Maximum: 6
Zeros: 61148
Zeros (%): 40.6%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 1
95-th percentile: 4
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.594509362
Coefficient of variation (CV): 1.304361349
Kurtosis: 1.695486794
Mean: 1.222444504
Median Absolute Deviation (MAD): 1
Skewness: 1.596591837
Sum: 183929
Variance: 2.542460105
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
0 | 61148 | 40.6%
1 | 55497 | 36.9%
4 | 17033 | 11.3%
2 | 9705 | 6.5%
6 | 6523 | 4.3%
3 | 509 | 0.3%
5 | 45 | < 0.1%
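
The summary statistics for batsman_runs can be cross-checked against the value counts in the frequency table. The following is a minimal sketch, assuming the reported variance is the sample variance (divisor n - 1, the pandas default); that assumption is not stated in the report itself.

```python
import math

# Value counts for batsman_runs, copied from the report's frequency table.
counts = {0: 61148, 1: 55497, 2: 9705, 3: 509, 4: 17033, 5: 45, 6: 6523}

n = sum(counts.values())                                  # number of observations
total = sum(value * count for value, count in counts.items())  # "Sum" in the report
mean = total / n

# Sample variance with divisor n - 1 (assumed, see the note above).
sum_sq_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = sum_sq_dev / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                           # coefficient of variation

print(n, total)
print(round(mean, 9), round(variance, 9), round(std, 9), round(cv, 9))
```

The same approach applies to extra_runs and total_runs by substituting their value counts.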

extra_runs
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06902166689
Minimum: 0
Maximum: 7
Zeros: 142264
Zeros (%): 94.6%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 7
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3496673592
Coefficient of variation (CV): 5.066052082
Kurtosis: 89.30223064
Mean: 0.06902166689
Median Absolute Deviation (MAD): 0
Skewness: 8.165187813
Sum: 10385
Variance: 0.1222672621
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
0 | 142264 | 94.6%
1 | 7268 | 4.8%
2 | 372 | 0.2%
4 | 295 | 0.2%
5 | 203 | 0.1%
3 | 57 | < 0.1%
7 | 1 | < 0.1%

total_runs
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.29146617
Minimum: 0
Maximum: 7
Zeros: 53305
Zeros (%): 35.4%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 1
95-th percentile: 4
Maximum: 7
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.583240328
Coefficient of variation (CV): 1.225924739
Kurtosis: 1.627705937
Mean: 1.29146617
Median Absolute Deviation (MAD): 1
Skewness: 1.563859003
Sum: 194314
Variance: 2.506649935
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Value | Count | Frequency (%)
1 | 62205 | 41.3%
0 | 53305 | 35.4%
4 | 17250 | 11.5%
2 | 10253 | 6.8%
6 | 6486 | 4.3%
3 | 597 | 0.4%
5 | 326 | 0.2%
7 | 38 | < 0.1%

player_dismissed
Categorical

HIGH CARDINALITY
MISSING

Distinct: 436
Distinct (%): 5.9%
Missing: 143022
Missing (%): 95.1%
Memory size: 1.1 MiB

Value | Count
SK Raina | 134
G Gambhir | 131
RG Sharma | 129
RV Uthappa | 128
V Kohli | 118
Other values (431) | 6798

Length

Max length: 20
Median length: 9
Mean length: 9.425786502
Min length: 5

Characters and Unicode

Total characters: 70109
Distinct characters: 54
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 73
Unique (%): 1.0%

Sample

1st row: DA Warner
2nd row: S Dhawan
3rd row: MC Henriques
4th row: Yuvraj Singh
5th row: Mandeep Singh

Common Values

Value | Count | Frequency (%)
SK Raina | 134 | 0.1%
G Gambhir | 131 | 0.1%
RG Sharma | 129 | 0.1%
RV Uthappa | 128 | 0.1%
V Kohli | 118 | 0.1%
KD Karthik | 117 | 0.1%
S Dhawan | 109 | 0.1%
PA Patel | 107 | 0.1%
Yuvraj Singh | 101 | 0.1%
DA Warner | 100 | 0.1%
Other values (426) | 6264 | 4.2%
(Missing) | 143022 | 95.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
singh | 279 | 1.8%
s | 235 | 1.5%
v | 220 | 1.4%
r | 215 | 1.4%
m | 211 | 1.4%
sharma | 205 | 1.3%
sk | 161 | 1.1%
patel | 156 | 1.0%
pathan | 156 | 1.0%
smith | 156 | 1.0%
Other values (606) | 13227 | 86.9%

Most occurring characters

Value | Count | Frequency (%)
a | 7966 | 11.4%
(space) | 7783 | 11.1%
i | 3387 | 4.8%
h | 3325 | 4.7%
r | 3170 | 4.5%
n | 3143 | 4.5%
e | 2850 | 4.1%
S | 2708 | 3.9%
l | 2480 | 3.5%
M | 1909 | 2.7%
Other values (44) | 31388 | 44.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 41078 | 58.6%
Uppercase Letter | 21226 | 30.3%
Space Separator | 7783 | 11.1%
Dash Punctuation | 22 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
S | 2708 | 12.8%
M | 1909 | 9.0%
A | 1736 | 8.2%
R | 1722 | 8.1%
K | 1605 | 7.6%
P | 1508 | 7.1%
D | 1393 | 6.6%
J | 1120 | 5.3%
V | 960 | 4.5%
G | 950 | 4.5%
Other values (16) | 5615 | 26.5%

Lowercase Letter
Value | Count | Frequency (%)
a | 7966 | 19.4%
i | 3387 | 8.2%
h | 3325 | 8.1%
r | 3170 | 7.7%
n | 3143 | 7.7%
e | 2850 | 6.9%
l | 2480 | 6.0%
s | 1711 | 4.2%
t | 1586 | 3.9%
o | 1525 | 3.7%
Other values (16) | 9935 | 24.2%

Space Separator
Value | Count | Frequency (%)
(space) | 7783 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 22 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 62304 | 88.9%
Common | 7805 | 11.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 7966 | 12.8%
i | 3387 | 5.4%
h | 3325 | 5.3%
r | 3170 | 5.1%
n | 3143 | 5.0%
e | 2850 | 4.6%
S | 2708 | 4.3%
l | 2480 | 4.0%
M | 1909 | 3.1%
A | 1736 | 2.8%
Other values (42) | 29630 | 47.6%

Common
Value | Count | Frequency (%)
(space) | 7783 | 99.7%
- | 22 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 70109 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 7966 | 11.4%
(space) | 7783 | 11.1%
i | 3387 | 4.8%
h | 3325 | 4.7%
r | 3170 | 4.5%
n | 3143 | 4.5%
e | 2850 | 4.1%
S | 2708 | 3.9%
l | 2480 | 3.5%
M | 1909 | 2.7%
Other values (44) | 31388 | 44.8%

dismissal_kind
Categorical

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): 0.1%
Missing: 143022
Missing (%): 95.1%
Memory size: 1.1 MiB

Value | Count
caught | 4373
bowled | 1382
run out | 755
lbw | 455
stumped | 243
Other values (4) | 230

Length

Max length: 21
Median length: 6
Mean length: 6.276821726
Min length: 3

Characters and Unicode

Total characters: 46687
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: caught
2nd row: caught
3rd row: caught
4th row: bowled
5th row: bowled

Common Values

Value | Count | Frequency (%)
caught | 4373 | 2.9%
bowled | 1382 | 0.9%
run out | 755 | 0.5%
lbw | 455 | 0.3%
stumped | 243 | 0.2%
caught and bowled | 211 | 0.1%
hit wicket | 9 | < 0.1%
retired hurt | 9 | < 0.1%
obstructing the field | 1 | < 0.1%
(Missing) | 143022 | 95.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
caught | 4584 | 53.1%
bowled | 1593 | 18.4%
run | 755 | 8.7%
out | 755 | 8.7%
lbw | 455 | 5.3%
stumped | 243 | 2.8%
and | 211 | 2.4%
hurt | 9 | 0.1%
hit | 9 | 0.1%
retired | 9 | 0.1%
Other values (4) | 12 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
u | 6347 | 13.6%
t | 5621 | 12.0%
a | 4795 | 10.3%
h | 4603 | 9.9%
c | 4594 | 9.8%
g | 4585 | 9.8%
o | 2349 | 5.0%
w | 2057 | 4.4%
d | 2057 | 4.4%
b | 2049 | 4.4%
Other values (11) | 7630 | 16.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 45490 | 97.4%
Space Separator | 1197 | 2.6%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
u | 6347 | 14.0%
t | 5621 | 12.4%
a | 4795 | 10.5%
h | 4603 | 10.1%
c | 4594 | 10.1%
g | 4585 | 10.1%
o | 2349 | 5.2%
w | 2057 | 4.5%
d | 2057 | 4.5%
b | 2049 | 4.5%
Other values (10) | 6433 | 14.1%

Space Separator
Value | Count | Frequency (%)
(space) | 1197 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 45490 | 97.4%
Common | 1197 | 2.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
u | 6347 | 14.0%
t | 5621 | 12.4%
a | 4795 | 10.5%
h | 4603 | 10.1%
c | 4594 | 10.1%
g | 4585 | 10.1%
o | 2349 | 5.2%
w | 2057 | 4.5%
d | 2057 | 4.5%
b | 2049 | 4.5%
Other values (10) | 6433 | 14.1%

Common
Value | Count | Frequency (%)
(space) | 1197 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 46687 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
u | 6347 | 13.6%
t | 5621 | 12.0%
a | 4795 | 10.3%
h | 4603 | 9.9%
c | 4594 | 9.8%
g | 4585 | 9.8%
o | 2349 | 5.0%
w | 2057 | 4.4%
d | 2057 | 4.4%
b | 2049 | 4.4%
Other values (11) | 7630 | 16.3%

fielder
Categorical

HIGH CARDINALITY
MISSING

Distinct: 451
Distinct (%): 8.4%
Missing: 145091
Missing (%): 96.4%
Memory size: 1.1 MiB

Value | Count
KD Karthik | 127
MS Dhoni | 126
RV Uthappa | 115
AB de Villiers | 101
SK Raina | 96
Other values (446) | 4804

Length

Max length: 21
Median length: 9
Mean length: 9.527845036
Min length: 5

Characters and Unicode

Total characters: 51155
Distinct characters: 55
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 83
Unique (%): 1.5%

Sample

1st row: Mandeep Singh
2nd row: Sachin Baby
3rd row: Sachin Baby
4th row: DA Warner
5th row: BCJ Cutting

Common Values

Value | Count | Frequency (%)
KD Karthik | 127 | 0.1%
MS Dhoni | 126 | 0.1%
RV Uthappa | 115 | 0.1%
AB de Villiers | 101 | 0.1%
SK Raina | 96 | 0.1%
PA Patel | 84 | 0.1%
NV Ojha | 82 | 0.1%
RG Sharma | 79 | 0.1%
AC Gilchrist | 75 | < 0.1%
V Kohli | 74 | < 0.1%
Other values (441) | 4410 | 2.9%
(Missing) | 145091 | 96.4%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
singh | 177 | 1.6%
m | 176 | 1.6%
r | 167 | 1.5%
sharma | 162 | 1.5%
ms | 161 | 1.5%
karthik | 141 | 1.3%
patel | 139 | 1.3%
s | 133 | 1.2%
kd | 127 | 1.1%
dhoni | 126 | 1.1%
Other values (573) | 9589 | 86.4%

Most occurring characters

Value | Count | Frequency (%)
a | 5830 | 11.4%
(space) | 5729 | 11.2%
i | 2589 | 5.1%
h | 2497 | 4.9%
r | 2316 | 4.5%
n | 2254 | 4.4%
e | 2027 | 4.0%
S | 1943 | 3.8%
l | 1805 | 3.5%
M | 1390 | 2.7%
Other values (45) | 22775 | 44.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 29980 | 58.6%
Uppercase Letter | 15283 | 29.9%
Space Separator | 5729 | 11.2%
Open Punctuation | 76 | 0.1%
Close Punctuation | 76 | 0.1%
Dash Punctuation | 11 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 5830 | 19.4%
i | 2589 | 8.6%
h | 2497 | 8.3%
r | 2316 | 7.7%
n | 2254 | 7.5%
e | 2027 | 6.8%
l | 1805 | 6.0%
t | 1274 | 4.2%
s | 1261 | 4.2%
o | 1115 | 3.7%
Other values (16) | 7012 | 23.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 1943 | 12.7%
M | 1390 | 9.1%
A | 1286 | 8.4%
K | 1273 | 8.3%
R | 1186 | 7.8%
P | 1138 | 7.4%
D | 1074 | 7.0%
J | 761 | 5.0%
B | 728 | 4.8%
V | 709 | 4.6%
Other values (15) | 3795 | 24.8%

Space Separator
Value | Count | Frequency (%)
(space) | 5729 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 76 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 76 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 11 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 45263 | 88.5%
Common | 5892 | 11.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 5830 | 12.9%
i | 2589 | 5.7%
h | 2497 | 5.5%
r | 2316 | 5.1%
n | 2254 | 5.0%
e | 2027 | 4.5%
S | 1943 | 4.3%
l | 1805 | 4.0%
M | 1390 | 3.1%
A | 1286 | 2.8%
Other values (41) | 21326 | 47.1%

Common
Value | Count | Frequency (%)
(space) | 5729 | 97.2%
( | 76 | 1.3%
) | 76 | 1.3%
- | 11 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 51155 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 5830 | 11.4%
(space) | 5729 | 11.2%
i | 2589 | 5.1%
h | 2497 | 4.9%
r | 2316 | 4.5%
n | 2254 | 4.4%
e | 2027 | 4.0%
S | 1943 | 3.8%
l | 1805 | 3.5%
M | 1390 | 2.7%
Other values (45) | 22775 | 44.5%

Interactions

[64 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the slope of the line affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
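
The calculation described above can be sketched in plain Python. This is an illustrative sketch rather than the code the report uses; `pearson_r` is a hypothetical helper name, and the input lists are the batsman_runs and total_runs values from the ten "First rows" in the Sample section, so the result says nothing about the full dataset.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# batsman_runs and total_runs from the ten "First rows" of the sample.
batsman_runs = [0, 0, 4, 0, 0, 0, 0, 1, 4, 0]
total_runs = [0, 0, 4, 0, 2, 0, 1, 1, 4, 1]

r = pearson_r(batsman_runs, total_runs)

# Shifting and rescaling one variable leaves r unchanged.
rescaled = [10 * v - 3 for v in total_runs]
r_rescaled = pearson_r(batsman_runs, rescaled)

print(r, r_rescaled)
```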

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
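
A minimal sketch of this rank-based calculation follows. The helper names are hypothetical, and tied values are given the average of their rank positions, which is a common convention assumed here rather than something the report specifies. The data are illustrative and not taken from the dataset.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
```

On this example the rank-based coefficient reaches 1 while Pearson's r stays below 1, which is the difference the paragraph above describes.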

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
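
The pair-counting definition above can be sketched directly. This hypothetical `kendall_tau_a` helper implements exactly that formula (often called tau-a); tie-adjusted variants such as tau-b, which statistical libraries commonly use, can give different values when the data contain ties. The inputs are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```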

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
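
A minimal sketch of a bias-corrected Cramér's V follows, written from the commonly cited form of Bergsma's correction. The function name is hypothetical and this is not necessarily how the report computes the value; the inputs are illustrative labels, not columns from the dataset.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction terms.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate case: a variable with a single category
    return math.sqrt(phi2_corr / denom)

perfect_x = ["a", "b"] * 50
perfect_y = ["u", "v"] * 50          # fully determined by perfect_x
indep_x = ["a", "a", "b", "b"] * 25
indep_y = ["u", "v", "u", "v"] * 25  # every combination equally frequent

print(cramers_v_corrected(perfect_x, perfect_y))
print(cramers_v_corrected(indep_x, indep_y))
```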

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

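(index) | match_id | inning | batting_team | bowling_team | over | ball | batsman | non_striker | bowler | is_super_over | wide_runs | bye_runs | legbye_runs | noball_runs | penalty_runs | batsman_runs | extra_runs | total_runs | player_dismissed | dismissal_kind | fielder
0 | 1 | 1 | SH | RCB | 1 | 1 | DA Warner | S Dhawan | TS Mills | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN
1 | 1 | 1 | SH | RCB | 1 | 2 | DA Warner | S Dhawan | TS Mills | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN
2 | 1 | 1 | SH | RCB | 1 | 3 | DA Warner | S Dhawan | TS Mills | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 4 | NaN | NaN | NaN
3 | 1 | 1 | SH | RCB | 1 | 4 | DA Warner | S Dhawan | TS Mills | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN
4 | 1 | 1 | SH | RCB | 1 | 5 | DA Warner | S Dhawan | TS Mills | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | NaN | NaN | NaN
5 | 1 | 1 | SH | RCB | 1 | 6 | S Dhawan | DA Warner | TS Mills | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN
6 | 1 | 1 | SH | RCB | 1 | 7 | S Dhawan | DA Warner | TS Mills | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | NaN | NaN | NaN
7 | 1 | 1 | SH | RCB | 2 | 1 | S Dhawan | DA Warner | A Choudhary | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN
8 | 1 | 1 | SH | RCB | 2 | 2 | DA Warner | S Dhawan | A Choudhary | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 4 | NaN | NaN | NaN
9 | 1 | 1 | SH | RCB | 2 | 3 | DA Warner | S Dhawan | A Choudhary | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | NaN | NaN | NaN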

Last rows

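(index) | match_id | inning | batting_team | bowling_team | over | ball | batsman | non_striker | bowler | is_super_over | wide_runs | bye_runs | legbye_runs | noball_runs | penalty_runs | batsman_runs | extra_runs | total_runs | player_dismissed | dismissal_kind | fielder
150450 | 636 | 2 | RCB | SH | 19 | 3 | CJ Jordan | Sachin Baby | Mustafizur Rahman | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN
150451 | 636 | 2 | RCB | SH | 19 | 4 | CJ Jordan | Sachin Baby | Mustafizur Rahman | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | NaN | NaN | NaN
150452 | 636 | 2 | RCB | SH | 19 | 5 | Sachin Baby | CJ Jordan | Mustafizur Rahman | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | NaN | NaN | NaN
150453 | 636 | 2 | RCB | SH | 19 | 6 | Sachin Baby | CJ Jordan | Mustafizur Rahman | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 6 | NaN | NaN | NaN
150454 | 636 | 2 | RCB | SH | 20 | 1 | CJ Jordan | Sachin Baby | B Kumar | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN
150455 | 636 | 2 | RCB | SH | 20 | 2 | Sachin Baby | CJ Jordan | B Kumar | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | NaN | NaN | NaN
150456 | 636 | 2 | RCB | SH | 20 | 3 | Sachin Baby | CJ Jordan | B Kumar | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | CJ Jordan | run out | NV Ojha
150457 | 636 | 2 | RCB | SH | 20 | 4 | Iqbal Abdulla | Sachin Baby | B Kumar | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | NaN | NaN | NaN
150458 | 636 | 2 | RCB | SH | 20 | 5 | Sachin Baby | Iqbal Abdulla | B Kumar | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | NaN | NaN | NaN
150459 | 636 | 2 | RCB | SH | 20 | 6 | Iqbal Abdulla | Sachin Baby | B Kumar | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 4 | NaN | NaN | NaN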
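
The sample rows suggest that extra_runs is the sum of the five extras columns and that total_runs equals batsman_runs plus extra_runs, which would be consistent with the high-correlation warnings among these fields. The report does not state this relationship; the sketch below only checks it on the ten "First rows" and on the column sums reported in the descriptive statistics. The helper name is hypothetical.

```python
# (wide, bye, legbye, noball, penalty, batsman, extra, total) for the ten "First rows".
first_rows = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 0, 0, 4, 0, 4),
    (0, 0, 0, 0, 0, 0, 0, 0),
    (2, 0, 0, 0, 0, 0, 2, 2),
    (0, 0, 0, 0, 0, 0, 0, 0),
    (0, 0, 1, 0, 0, 0, 1, 1),
    (0, 0, 0, 0, 0, 1, 0, 1),
    (0, 0, 0, 0, 0, 4, 0, 4),
    (0, 0, 0, 1, 0, 0, 1, 1),
]

def is_consistent(row):
    """Check the assumed identities for one delivery."""
    wide, bye, legbye, noball, penalty, batsman, extra, total = row
    extras_match = extra == wide + bye + legbye + noball + penalty
    total_matches = total == batsman + extra
    return extras_match and total_matches

all_rows_consistent = all(is_consistent(row) for row in first_rows)

# Column sums as reported under "Descriptive statistics".
sum_batsman_runs = 183929
sum_extra_runs = 10385
sum_total_runs = 194314
sums_consistent = sum_batsman_runs + sum_extra_runs == sum_total_runs

print(all_rows_consistent, sums_consistent)
```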